Abstract. We describe an algorithm for compressing a partially ordered
set, or poset, so that it occupies space matching the information theory 
lower bound (to within lower order terms), in the worst case. Using this 
algorithm, we design a succinct data structure for representing a poset 
that, given two elements, can report whether one precedes the other in 
constant time. This is equivalent to succinctly representing the transitive 
closure graph of the poset, and we note that the same method can also 
be used to succinctly represent the transitive reduction graph. For an n 
element poset, the data structure occupies $n^2/4 + o(n^2)$ bits, in the worst
case, which is roughly half the space occupied by an upper triangular 
matrix. Furthermore, a slight extension to this data structure yields a 
succinct oracle for reachability in arbitrary directed graphs. Thus, using 
roughly a quarter of the space required to represent an arbitrary directed 
graph, reachability queries can be supported in constant time. 

1 Introduction 

Partially ordered sets, or posets, are useful for modelling relationships between 
objects, and appear in many different areas, such as natural language processing, 
machine learning, and database systems. As problem instances in these areas 
are ever-increasing in size, developing more space efficient data structures for 
representing posets is becoming an increasingly important problem. 

When designing a data structure to represent a particular type of combina- 
torial object, it is useful to first determine how many objects there are of that 
type. By a constructive enumeration argument, Kleitman and Rothschild [10]
showed that the number of n element posets is $2^{n^2/4 + O(n)}$. Thus, the
information theoretic lower bound indicates that representing an arbitrary poset
requires $\lg(2^{n^2/4 + O(n)}) = n^2/4 + O(n)$ bits.¹ This naturally raises the question of how
a poset can be represented using only $n^2/4 + o(n^2)$ bits, and support efficient
query operations. Such a representation, that occupies space matching the in- 
formation theoretic lower bound to within lower order terms while supporting 
efficient query operations, is called a succinct data structure [5]. 

The purpose of this paper is to answer this question by describing the first 
succinct representation of arbitrary posets. We give a detailed description of our 
results in Section 4, but first provide some definitions in Section 2 and then
highlight some of the previous work related to this problem in Section 3.

* This research was funded in part by NSERC of Canada, and the Canada Research
Chairs program.
¹ We use lg n to denote $\lceil \log_2 n \rceil$.



2 Definitions 



A poset P is a reflexive, antisymmetric, transitive binary relation $\preceq$ on a set
of n elements S, denoted $P = (S, \preceq)$. Let a and b be two elements in S. If
$a \preceq b$, we say a precedes b. We refer to queries of the form, "Does a precede
b?" as precedence queries. If neither $a \preceq b$ nor $b \preceq a$, then we say a and b are
incomparable. For convenience we write $a \prec b$ if $a \preceq b$ and $a \neq b$.

Each poset $P = (S, \preceq)$ is uniquely described by a directed acyclic graph, or
DAG, $G_c = (S, E_c)$, where $E_c = \{(a, b) : a \prec b\}$ is the set of edges. The DAG $G_c$
is the transitive closure graph of P. Note that a precedence query for elements
a and b is equivalent to the query, "Is the edge (a, b) in $E_c$?" Alternatively, let
$G_r = (S, E_r)$ be the DAG such that $E_r = \{(a, b) : a \prec b, \nexists c \in S, a \prec c \prec b\}$, i.e.,
the minimal set of edges that imply all the edges in $E_c$ by transitivity. The DAG
$G_r$ also uniquely describes P, and is called the transitive reduction graph of P.

Posets are also sometimes illustrated using a Hasse diagram, which displays 
all the edges in the transitive reduction, and indicates the direction of an edge 
(a, b) by drawing element a above b. We refer to elements that have no outward 
edges in the transitive reduction as sinks, and elements that have no inward edges 
in the transitive reduction as sources. See Figure 1 for an example. Since all these
concepts are equivalent, we may freely move between them when discussing a 
poset, depending on which representation is the most convenient. 
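The relationship between the two graphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four-element example poset are hypothetical, and the cubic-time loops are chosen for readability rather than efficiency.

```python
from itertools import product

def transitive_closure(elements, edges):
    """Return E_c: every pair (a, b) with a strictly preceding b."""
    closure = set(edges)
    # Warshall-style pass: the intermediate element k is the outermost loop.
    for k, a, b in product(elements, repeat=3):
        if (a, k) in closure and (k, b) in closure:
            closure.add((a, b))
    return closure

def transitive_reduction(elements, closure):
    """Return E_r: pairs of E_c with no element strictly between them."""
    return {
        (a, b)
        for (a, b) in closure
        if not any((a, c) in closure and (c, b) in closure for c in elements)
    }

# Hypothetical four-element poset: a < b < c and a < d.
elements = ["a", "b", "c", "d"]
E_c = transitive_closure(elements, {("a", "b"), ("b", "c"), ("a", "d")})
E_r = transitive_reduction(elements, E_c)
```

Here the closure gains the implied edge (a, c), and the reduction drops it again, so both graphs describe the same poset.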




Fig. 1. A Hasse diagram of a poset (left), the transitive reduction (centre), and the 
transitive closure (right). Elements a and b are sources, and elements g and f are sinks.

A linear extension $L = \{a_1, ..., a_n\}$ is a total ordering of the elements in S
such that if $a_i \prec a_j$ for some $i \neq j$, then $i < j$. However, note that the converse is
not necessarily true: we cannot determine whether $a_i \prec a_j$ unless we know that
$a_i$ and $a_j$ are comparable elements. A chain of a poset, $P = (S, \preceq)$, is a total
ordering $C = \{a_1, ..., a_k\}$ on a subset of k elements from S such that $a_i \prec a_j$ iff
$i < j$, for $1 \le i < j \le k$. An antichain is a set $A = \{a_1, ..., a_k\}$ that is a subset of
k elements from S, such that each $a_i$ and $a_j$ are incomparable, for $1 \le i < j \le k$.
The height of a poset is the size of its maximum length chain, and the width of
a poset is the size of its maximum antichain.

For a graph G = (V, E), we sometimes use E(H) to denote the set of edges
$\{(a, b) : (a, b) \in E, a \in H, b \in H\}$, where $H \subseteq V$. Similarly, we use G(H) to
denote the subgraph of G induced by H, i.e., the subgraph with vertex set H and
edge set E(H). Finally, if $(a, b) \in E$, or $(b, a) \in E$, we say that b is a neighbour
of a in G.

3 Previous work 

Previous work in the area of succinct data structures includes representations of 
arbitrary undirected graphs [5] , planar graphs pQ , and trees [13] . There has also 
been interest in developing reachability oracles for planar directed graphs [18],
as well as approximate distance oracles for undirected graphs [TU] . For restricted 
classes of posets, such as lattices [17] and distributive lattices [7] , space efficient 
representations have been developed, though they are not succinct. 

One way of storing a poset is by representing either its transitive closure 
graph, or transitive reduction graph, using an adjacency matrix. If we topologi- 
cally order the vertices of this graph, then we can use an upper triangular matrix 
to represent the edges, since the graph is a DAG. Such a representation occupies 
$\binom{n}{2}$ bits, and can, in a single bit probe, be used to report whether an edge exists
in the graph between two specified elements. Thus, using this simple approach
we can achieve a space bound that is roughly two times the information theory
lower bound for representing a poset. An alternative representation, called the
ChainMerge structure, was proposed by Daskalakis et al. [4]; it occupies O(nw)
words of space, where w is the width of the poset. The ChainMerge structure,
like the transitive closure graph, supports precedence queries in O(1) time.
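The upper triangular baseline can be sketched as follows. This is a minimal illustration, not the paper's structure: the class name and example are hypothetical, and a `bytearray` holds one byte per logical bit for readability, whereas a real implementation would pack the n(n-1)/2 bits.

```python
class UpperTriangular:
    """Transitive closure stored as the strict upper triangle of an adjacency matrix."""

    def __init__(self, topo_order, closure):
        # topo_order must be a linear extension, so every edge goes "left to right".
        self.pos = {v: i for i, v in enumerate(topo_order)}
        self.n = len(topo_order)
        self.bits = bytearray(self.n * (self.n - 1) // 2)
        for a, b in closure:
            self.bits[self._index(self.pos[a], self.pos[b])] = 1

    def _index(self, i, j):
        # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) entries; row i starts after them.
        assert i < j
        return i * (self.n - 1) - i * (i - 1) // 2 + (j - i - 1)

    def precedes(self, a, b):
        """Strict precedence query answered with a single probe."""
        i, j = self.pos[a], self.pos[b]
        return i < j and self.bits[self._index(i, j)] == 1

# Hypothetical poset: a < b < c and a < d, with linear extension a, b, c, d.
order = ["a", "b", "c", "d"]
closure = {("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")}
matrix = UpperTriangular(order, closure)
```

For n = 4 this uses 6 logical bits, against the $n^2/4 = 4$ of the lower bound's leading term, which is the factor of roughly two mentioned above.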

Recently, Farzan and Fischer [5] presented a data structure that represents
a poset using $2nw(1 + o(1)) + (1 + \epsilon) n \lg n$ bits, where w is the width of the
poset, and $\epsilon > 0$ is an arbitrary positive constant. This data structure supports
precedence queries in O(1) time, and many other operations in time proportional
to the width of the poset. These operations are best expressed in terms of the
transitive closure and reduction graphs, and include: reporting all neighbours of
an element in the transitive closure in O(w + k) time, where k is the number
of reported elements; reporting all neighbours of an element in the transitive
reduction in $O(w^2)$ time; reporting an arbitrary neighbour of an element in the
transitive reduction in O(w) time; reporting whether an edge exists between two
elements in the transitive reduction in O(w) time; reporting all elements that,
for two elements a and b, are both preceded by a and precede b in O(w + k) time;
among others. The basic idea of their data structure is to encode the ChainMerge 
structure of Daskalakis et al. [4] using bit sequences, and answer queries using 
rank and select operations on these bit sequences. 

Since the data structure of Farzan and Fischer [5] is adaptive on width, it is 
appropriate for posets where the width is a slow-growing function of n. However, 
if we select a poset of n elements uniformly at random from the set of all possible 
n element posets, then it will have width n/2 + o(n) with high probability |llj . 
Thus, this representation may occupy as many as $n^2 + o(n^2)$ bits, which is roughly
four times the information theory lower bound. Furthermore, with the exception
of precedence queries, all other operations take linear time for such a poset.

4 Our Results 

Our results hold in the word-RAM model of computation with word size $\Theta(\lg n)$
bits. Our main result is summarized in the following theorem:

Theorem 1. Let $P = (S, \preceq)$ be a poset, where $|S| = n$. There is a succinct data
structure for representing P that occupies $n^2/4 + O((n^2 \lg\lg n)/\lg n)$ bits, and
can support precedence queries in O(1) time: i.e., given two elements $a, b \in S$,
report whether $a \preceq b$.

The previous theorem implies that we can, in O(1) time, answer queries of
the form, "Is the edge (a, b) in the transitive closure graph of P?" In fact, we
can also apply the same representation to support, in O(1) time, queries of the
form, "Is the edge (a, b) in the transitive reduction graph of P?" However, at
present it seems as though we can only support efficient queries in one or the 
other, not both simultaneously. For this reason we focus on the closure, since it 
is likely more useful, but state the following theorem: 

Theorem 2. Let $G_r = (S, E_r)$ be the transitive reduction graph of a poset, where
$|S| = n$. There is a succinct data structure for representing $G_r$ that occupies
$n^2/4 + O((n^2 \lg\lg n)/\lg n)$ bits, and, given two elements $a, b \in S$, can report
whether $(a, b) \in E_r$ in O(1) time.

Reachability in Directed Graphs: For an arbitrary DAG, the reachability relation
between vertices is a poset: i.e., given two vertices, a and b, the relation
of whether there is a directed path from a to b in the DAG. As a consequence,
Theorem 1 implies that there is a data structure that occupies $n^2/4 + o(n^2)$
bits, and can support reachability queries in a DAG, in O(1) time. We can even
strengthen this observation by noting that for an arbitrary directed graph G, the
condensation of G — the graph that results by contracting each strongly con- 
nected component into a single vertex Section 22.5] — is a DAG. Given two 
vertices a and b, if a and b are in the same strongly connected component, then 
b is reachable from a. Otherwise, we can apply Theorem 1 to the condensation
of G. Thus, we get the following corollary: 

Corollary 1. Let G be a directed graph. There is a data structure that occupies
$n^2/4 + o(n^2)$ bits and, given two vertices of G, a and b, can report whether b is
reachable from a in O(1) time.
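The reduction behind the corollary can be sketched as below. The sketch only shows the two-case query logic; it is not succinct. As labelled assumptions, strongly connected components are found by brute-force mutual reachability, and a plain Python set of component pairs stands in for the poset structure of Theorem 1. All names are hypothetical.

```python
def reach_from(adj, source):
    """Set of vertices reachable from source (including source itself)."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def build_reachability_oracle(vertices, adj):
    reach = {v: reach_from(adj, v) for v in vertices}
    # Label each vertex with the smallest vertex of its strongly connected component.
    comp = {v: min(u for u in reach[v] if v in reach[u]) for v in vertices}
    # Strict order on components: the transitive closure of the condensation.
    order = {(comp[a], comp[b]) for a in vertices for b in reach[a]
             if comp[a] != comp[b]}

    def reachable(a, b):
        # Same component: always reachable. Otherwise ask the poset on components.
        return comp[a] == comp[b] or (comp[a], comp[b]) in order

    return reachable

# Hypothetical graph: 1 <-> 2 form a cycle, 2 -> 3, and 4 is isolated.
reachable = build_reachability_oracle([1, 2, 3, 4], {1: [2], 2: [1, 3], 3: [], 4: []})
```

The component labels cost O(n lg n) bits, a lower order term, so the space is dominated by whatever represents the order on components.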

Note that the space bound of the previous corollary is roughly a quarter of the 
space required to represent an arbitrary directed graph! Switching back to the 
terminology of order theory, the previous corollary generalizes Theorem 1 to the
larger class of binary relations known as quasi-orders: i.e., binary relations that
are reflexive and transitive, but not necessarily antisymmetric. In fact, reflexivity
does not restrict the binary relation very much, so we can further generalize
Theorem 1 to arbitrary transitive binary relations; we discuss this in Appendix A.

Overview of the data structure: The main idea behind our succinct data structure
is to develop an algorithm for compressing a poset so that it occupies space
matching the information theory lower bound (to within lower order terms), in
the worst case. The main difficulty is ensuring that we are able to query the
compressed structure efficiently. Our first attempt at designing a compression
algorithm was essentially a reverse engineered version of an enumeration proof 
by Kleitman and Rothschild [10]. However, though the algorithm achieved the 
desired space bound, there was no obvious way to answer queries on the com- 
pressed data due to one crucial compression step. Though there are several other 
enumeration proofs (cf., |f 112) ). they all appeal to a similar strategy, making 
the compressed data difficult to query. This led us to develop an alternate com- 
pression algorithm, that uses techniques from extremal graph theory. 

We believe it is conceptually simpler to present our algorithm as having two 
steps. In the first step, we preprocess the poset, removing edges in its transitive 
closure graph, to create a new poset where the height is not too large. We refer 
to what remains as a flat poset. We then make use of the fact that, in a flat 
poset, either balanced biclique subgraphs of the transitive closure graph (containing
$\Omega(\lg n / \lg\lg n)$ elements) must exist, or the poset is relatively sparsely
connected. In the former case, the connectivity between these balanced biclique 
subgraphs and the remaining elements is shown to be space efficient to encode 
using the fact that all edges implied by transitivity are in the transitive clo- 
sure graph. In the latter case, we can directly apply techniques from the area of 
succinct data structures to compress the poset. 

5 Succinct Data Structure 

In this section we describe a succinct data structure for representing posets. In 
order to refer to the elements in the poset, we assume each element has a label. 
Since our goal is to design a data structure that occupies $n^2/4 + o(n^2)$ bits, we
are free to assign arbitrary O(lg n)-bit labels to the elements, as such a labeling
will require only O(n lg n) bits. Thus, we can assume each element in our poset
has a distinct integer label, drawn from the range [1, n]. Our data structure
always refers to elements by their labels, so often when we refer to "element" a, 
it means "the element in S with label a", depending on context. 

5.1 Preliminary Data Structures 

Given a bit sequence B[1..n], we use access(B, i) to denote the i-th bit in B, and
rank(B, i) to denote the number of 1 bits in the prefix B[1..i]. We make use of the
following lemma, which can be used to support access and rank operations on
bit sequences, while compressing the sequence to its 0th-order empirical entropy.

Lemma 1 (Raman, Raman, Rao [16]). Given a bit sequence B of length
n, of which $\beta$ bits are 1, there is a data structure that can represent B using
$\lg \binom{n}{\beta} + O(n \lg\lg n/\lg n)$ bits that can support the operations access and rank
on B in O(1) time.
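To make the two operations concrete, here is a minimal sketch of access and rank with 1-indexed positions. It is not the structure of Lemma 1: it stores the bits uncompressed and scans inside a block, so rank costs time proportional to the block size. The class name and block size are assumptions made for illustration.

```python
class RankBits:
    """Uncompressed bit sequence with block-level prefix counts."""

    def __init__(self, bits, block=8):
        self.bits = list(bits)
        self.block = block
        # prefix[k] = number of 1 bits in the first k complete blocks.
        self.prefix = [0]
        for start in range(0, len(self.bits), block):
            self.prefix.append(self.prefix[-1] + sum(self.bits[start:start + block]))

    def access(self, i):
        """The i-th bit, for 1 <= i <= n."""
        return self.bits[i - 1]

    def rank(self, i):
        """Number of 1 bits in the prefix B[1..i]."""
        full, rest = divmod(i, self.block)
        start = full * self.block
        return self.prefix[full] + sum(self.bits[start:start + rest])

B = RankBits([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block=4)
```

A table lookup over block contents would replace the in-block scan in a constant-time implementation.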

5.2 Flattening a Poset



Let $\gamma > 0$ be a parameter, to be fixed later; the reader would not be misled by
thinking that we will eventually set $\gamma = \lg n$. We call a poset $\gamma$-flat if it has
height no greater than $\gamma$. In this section, we describe a preprocessing algorithm
for posets that outputs a data structure of size $O(n^2/\gamma)$ bits, that transforms a
poset into a $\gamma$-flat poset, without losing any information about its original structure.
After describing this preprocessing algorithm, we develop a compression
algorithm for flat posets. Using the preprocessing algorithm together with the 
compression algorithm yields a succinct data structure for posets. 

Let $P = (S, \preceq)$ be an arbitrary poset with transitive closure graph $G_c = (S, E_c)$.
We decompose the elements of S into antichains based on their height
within P. Let H(P) denote the height of P. All the sources in S are of height
1, and therefore are assigned to the same set. Each non-source element $a \in S$ is
assigned a height equal to the number of elements on the longest path from a source to a.
We use $U_h$ to denote the set of all the elements of height h, $1 \le h \le H(P)$, and
U to denote the set $\{U_1, ..., U_{H(P)}\}$. Furthermore, it is clear that each set, $U_h$, is
an antichain, since if $a \prec b$ then the height of b is strictly greater than that of a.
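The decomposition by height can be sketched as follows. The function name is hypothetical, and so is the example poset: its edges were chosen only so that the resulting levels match the sets listed for Figure 2, since the figure's actual edges are not recoverable from the text.

```python
def antichain_decomposition(elements, closure):
    """Group elements by height: sources get height 1, others 1 + max over predecessors."""
    preds = {v: set() for v in elements}
    for a, b in closure:
        preds[b].add(a)

    height = {}

    def h(v):
        if v not in height:
            height[v] = 1 + max((h(u) for u in preds[v]), default=0)
        return height[v]

    levels = {}
    for v in elements:
        levels.setdefault(h(v), []).append(v)
    return [levels[k] for k in sorted(levels)]

# Hypothetical closure, written as element -> all strictly greater elements.
above = {"a": "cefg", "b": "cdefg", "c": "efg", "d": "f", "e": "g"}
closure = {(p, q) for p, greater in above.items() for q in greater}
U = antichain_decomposition("abcdefg", closure)
```

Each returned level is an antichain, because any pair related in the closure is forced onto different heights.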

Next, we compute a linear extension C of the poset P in the following way,
using U. The linear extension C is ordered such that all elements in $U_i$ come
before $U_{i+1}$ for all $1 \le i < H(P)$, and the elements within the same $U_i$ are
ordered arbitrarily within C. Given any subset $S' \subseteq S$, we use the notation
S'(x) to denote the element ranked x-th according to C, among the elements
in the subset S'. We illustrate these concepts in Figure 2. Later, this particular
linear extension will be used extensively, when we output the structure of the
poset as a bit sequence.




[Figure 2: $U_1 = \{a, b\}$, $U_2 = \{c, d\}$, $U_3 = \{e, f\}$, $U_4 = \{g\}$; $S'(1) = b$, $S'(2) = a$,
$S'(3) = c$, $S'(4) = f$; $C = \{b, a, c, d, f, e, g\}$.]

Fig. 2. The antichain decomposition of the poset from Figure 1. The set S' is the set
of elements surrounded by the dotted line. Note that C is only one of many possible
linear extensions.



We now describe a preprocessing algorithm to transform an arbitrary poset
P into a $\gamma$-flat poset $\tilde{P}$. We assume P is not $\gamma$-flat, otherwise we are done. Given
two consecutive antichains $U_i$ and $U_{i+1}$, we define a merge step to be the operation
of replacing $U_i$ and $U_{i+1}$ by a new antichain $U'_i = U_i \cup U_{i+1}$, and outputting
and removing all the edges between elements in $U_i$ and $U_{i+1}$ in the transitive
closure of P, i.e., $E_c(U_i \cup U_{i+1})$. We say that $U_{i+1}$ is the upper antichain, $U_i$ is
the lower antichain, and refer to the new antichain $U'_i$ as the merged antichain.
Each antichain $U_j$ where $j > i + 1$ becomes antichain $U_{j-1}$ in the residual decomposition,
after the merge step. To represent the edges, let B be a bit sequence,
storing $|U_i||U_{i+1}|$ bits. The bit sequence B is further subdivided into sections,
denoted $B_x$, for each $x \in [1, |U_i|]$, where the bit $B_x[y]$ represents whether there
is an edge from $U_i(x)$ to $U_{i+1}(y)$; or equivalently, whether $U_i(x) \prec U_{i+1}(y)$.
We say that antichain $U_{i+1}$ is associated with B, and vice versa. The binary
string B is represented using the data structure of Lemma 1, which compresses
it to its 0th-order empirical entropy.² Note that, after the merge step, the elements
in merged antichain $U'_i$ are ordered, in the linear extension C, such that
$U'_i(x) = U_i(x)$ for $1 \le x \le |U_i|$ and $U'_i(y + |U_i|) = U_{i+1}(y)$ for $1 \le y \le |U_{i+1}|$.



Algorithm Flatten(U, i): where i is the index of an antichain in U.

  if $i \ge |U|$ then
    Exit
  end if
  if $|U_i| + |U_{i+1}| \le 2n/\gamma$ then
    Perform a merge step on $U_i$ and $U_{i+1}$
  else
    $i \leftarrow i + 1$
  end if
  Flatten(U, i)
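The pseudocode above can be sketched in Python as follows. This is an illustrative version under stated assumptions: the tail recursion is written as a loop, indices are 0-based, each merge step's output is kept as a small list of rows (one section per lower element) rather than a compressed bit sequence, and the example poset is hypothetical.

```python
def flatten(antichains, closure, gamma):
    """Merge consecutive antichains whose combined size is at most 2n/gamma."""
    n = sum(len(u) for u in antichains)
    levels = [list(u) for u in antichains]
    output = []                      # one bit matrix per merge step
    i = 0
    while i < len(levels) - 1:       # stop once no antichain follows position i
        lower, upper = levels[i], levels[i + 1]
        if len(lower) + len(upper) <= 2 * n / gamma:
            # Row x holds the bits for lower[x] against every upper element.
            output.append([[int((a, b) in closure) for b in upper] for a in lower])
            levels[i:i + 2] = [lower + upper]   # lower elements keep their ranks
        else:
            i += 1
    return levels, output

# Hypothetical closure, written as element -> all strictly greater elements.
above = {"a": "cefg", "b": "cdefg", "c": "efg", "d": "f", "e": "g"}
closure = {(p, q) for p, greater in above.items() for q in greater}
start = [["a", "b"], ["c", "d"], ["e", "f"], ["g"]]
residual, merges = flatten(start, closure, gamma=3)
```

With n = 7 and gamma = 3 the threshold is 14/3, so the first two levels merge, the merged level is too large to absorb {e, f}, and {e, f} then merges with {g}, leaving two residual antichains.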



There are many possible ways that we could apply merge steps to the poset in
order to make it $\gamma$-flat. The method we choose, presented in algorithm Flatten,
has the added benefit that accessing the output bit sequences is straightforward.
Let $\tilde{U}$ be the residual antichain decomposition that remains after executing
Flatten(U, 1), and $\tilde{P}$ be the resulting poset. The number of antichains in
$\tilde{U}$ is at most $\gamma$, and therefore the remaining poset $\tilde{P}$ is $\gamma$-flat. We make the
following further observation:

Lemma 2. Flatten(U, 1) outputs $O(n^2/\gamma)$ bits.

Proof. Consider the decomposition U and let $m = H(P) = |U|$. Let $n_1, ..., n_m$
denote the number of elements in $U_1, ..., U_m$, and $n_{s,t}$ to denote $\sum_{i=s}^{t} n_i$. We
use the fact that $\sum_{i=s}^{t-1} ((\sum_{j=s}^{i} n_j) n_{i+1}) \le n_{s,t}(n_{s,t} - 1)/2$, where
$1 \le s < t \le m$; we include a proof in Appendix B. For each of the at most $\gamma$
antichains in $\tilde{U}$, the previous inequality implies that Flatten outputs no more
than $O(n_{s,t}^2)$ bits, where $n_{s,t} = O(n/\gamma)$. Thus, overall the number of bits output
during the merging steps is $O((n/\gamma)^2 \gamma) = O(n^2/\gamma)$. □
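The inequality used in the proof can be checked numerically for small cases. The sketch below is a sanity check only, not a proof; the function names are hypothetical. `merge_cost` adds up the bits output when antichains of the given sizes are merged from left to right, which is the left-hand side of the inequality.

```python
from itertools import product

def merge_cost(sizes):
    """Bits output when antichains of these sizes are merged left to right."""
    total, merged = 0, sizes[0]
    for nxt in sizes[1:]:
        total += merged * nxt     # one bit per (merged element, next element) pair
        merged += nxt
    return total

def bound(sizes):
    """Right-hand side: N(N - 1)/2 for N = total number of elements."""
    n = sum(sizes)
    return n * (n - 1) // 2

def holds_for_all(max_size, count):
    """Exhaustively test every tuple of `count` sizes drawn from 1..max_size."""
    return all(merge_cost(s) <= bound(s)
               for s in product(range(1, max_size + 1), repeat=count))
```

For sizes (2, 2, 2) the cost is 4 + 8 = 12 bits against a bound of 15, and the bound is met with equality when every antichain has size 1.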



² We note that for our purposes in this section, compression of the bit sequence is
not required to achieve the desired asymptotic space bounds. However, the fact that
Lemma 1 compresses the bit sequence will indeed matter in Section 5.3.

We now show how to use the output of the merge steps to answer connectivity
queries for edges that were removed by the Flatten algorithm:

Lemma 3. There is a data structure of size $O(n^2/\gamma)$ bits that, given two elements
a and b can determine in O(1) time whether a precedes b, if both a and b
belong to the same antichain in the residual antichain decomposition $\tilde{U}$.

Proof. We add additional data structures to the output of Flatten in order
to support queries. Since the labels of elements in S are in the range [1, n], we
can treat elements as array indices. Thus, it is trivial to construct an O(n lg n)
bit array that, given elements $a, b \in S$, returns values $i, i', j, j', x, x', y$ and $y'$
in O(1) time such that $U_i(x) = a$, $U_j(y) = b$, $\tilde{U}_{i'}(x') = a$, $\tilde{U}_{j'}(y') = b$, where
$U_i, U_j \in U$ and $\tilde{U}_{i'}, \tilde{U}_{j'} \in \tilde{U}$. We also store an array A containing $|U|$ records. For
each antichain $U_i \in U$, if $U_i$ is the upper antichain during a merge step,³ then:
A[i].pnt points to the start of the sequence, B, associated with $U_i$, and; A[i].len
stores the length of the lower antichain. Recall that after the merge step, the
element $U_i(x)$ has rank x + A[i].len in the merged antichain. Thus, A[i].len is
the offset of the ranks of the elements of $U_i$ within the merged antichain. These
extra data structures occupy O(n lg n) bits and are dominated by the size of the
output of Flatten, so the claimed space bound holds by Lemma 2.

We now discuss how to answer a query. Given $a, b \in S$, if $i' \neq j'$, then we return
"different antichains". Otherwise, if $i = j$, then we return "no". Otherwise,
assume without loss of generality that $i > j$. Thus, $U_i$ is the upper antichain,
and A[i].pnt is a pointer to a sequence B, whereas $U_j$ is a subset of the lower
antichain $U_k$, and A[j].len is the offset of the elements in $U_j$ within $U_k$. Let
z = y + A[j].len, and return "yes" if $B_z[x] = 1$ and "no" otherwise. Section $B_z$
begins at the $((z - 1)|U_i|)$-th bit of B, so we can access $B_z[x]$ in O(1) time. □
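The index arithmetic of the query can be sketched as below. This is a simplified illustration under stated assumptions: every level is merged into a single residual antichain, positions are 0-based (so section z starts at bit z times the size of the upper antichain), and a dictionary of (bits, lower length) pairs plays the role of the array A. The names and the example poset are hypothetical.

```python
def build_index(levels, closure):
    """Merge all levels left to right, recording (bits, lower_len) per upper antichain."""
    A = {0: (None, 0)}                 # the first antichain is never an upper antichain
    merged = list(levels[0])
    for i in range(1, len(levels)):
        upper = levels[i]
        # Flat sequence B: one section per lower element, one bit per upper element.
        bits = [int((lo, up) in closure) for lo in merged for up in upper]
        A[i] = (bits, len(merged))
        merged += upper
    return A

def precedes_within(A, levels, i, x, j, y):
    """Does levels[j][y] precede levels[i][x]? Requires i > j; positions are 0-based."""
    bits, _ = A[i]
    z = y + A[j][1]                    # rank of the lower element in the merged antichain
    return bits[z * len(levels[i]) + x] == 1

# Hypothetical closure, written as element -> all strictly greater elements.
above = {"a": "cefg", "b": "cdefg", "c": "efg", "d": "f", "e": "g"}
closure = {(p, q) for p, greater in above.items() for q in greater}
levels = [["a", "b"], ["c", "d"], ["e", "f"], ["g"]]
A = build_index(levels, closure)
```

For example, asking whether c precedes e uses the record of the level containing e, shifts c's position by the two elements merged before its level, and probes a single bit.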

5.3 Compressing Flat Posets 

In this section we describe a compression algorithm for flat posets that, in the 
worst case, matches the information theory lower bound to within lower order 
terms. We begin by stating the following lemma, which is a constructive deterministic
version of a well known theorem by Kővári, Sós, and Turán [12]:

Lemma 4 (Mubayi and Turán [13]). There is a constant $c_{\min}$ such that,
given a graph with $|V| \ge c_{\min}$ vertices and $|E| \ge 8|V|^{3/2}$ edges, we can find a
balanced biclique $K_{q,q}$, where $q = \Theta(\lg |V| / \lg(|V|^2/|E|))$, in time $O(|E|)$.

Let P be a (lg n)-flat poset, $G_c = (S, E_c)$ be its transitive closure, and
$U = \{U_1, ..., U_m\}$ be its antichain decomposition (discussed in the last section), which
contains $m \le \lg n$ antichains. We now prove our key lemma, which is crucial for
the compression algorithm.

³ Note that, with the exception of the first merge step, $U_i \in U$ is not the i-th antichain
in the decomposition when the merge step occurs, but we will store records for the
index i rather than some intermediate index.

Lemma 5 (Key Lemma). Consider the subgraph $G_\tau = G_c(U_i \cup U_{i+1})$ for some
$1 \le i < m$, and ignore the edge directions so that $G_\tau$ is undirected. Suppose $G_\tau$
contains a balanced biclique subgraph with vertex set D, and $|D| = \tau$. Then there
are at most $2^{\tau/2+1} - 1$ ways that the vertices in D can be connected to each vertex
in $S \setminus (U_i \cup U_{i+1})$.

Proof. Each vertex $v \in S \setminus (U_i \cup U_{i+1})$ is in $U_j$, where either $j > i + 1$ or $j < i$.
Without loss of generality, consider the case where $j > i + 1$. If v is connected to
any vertex $u \in D \cap U_{i+1}$, then, by transitivity, v is connected to all vertices in
$D \cap U_i$, since every vertex of $D \cap U_i$ precedes u. Thus, v
can be connected to the vertices in $D \cap U_{i+1}$ in $2^{\tau/2} - 1$ ways, or to the vertices
in $D \cap U_i$ in $2^{\tau/2} - 1$ ways, or not connected to D at all. In total, there are
$2^{\tau/2+1} - 1$ ways to connect v to D. □
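The count in the key lemma can be confirmed by brute force for small bicliques. The sketch below is a check, not part of the paper's construction, and its names are hypothetical. It considers a vertex v lying above a biclique with q lower and q upper vertices (so $\tau = 2q$), enumerates every choice of which biclique vertices precede v, and keeps the choices consistent with transitivity.

```python
from itertools import combinations

def subsets(items):
    """All subsets of items, as sets."""
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def count_connections(q):
    """Number of valid neighbourhoods of a vertex v above a K_{q,q} biclique."""
    lower = [("lo", k) for k in range(q)]
    upper = [("hi", k) for k in range(q)]
    count = 0
    for below_lo in subsets(lower):
        for below_hi in subsets(upper):
            # If some upper vertex u precedes v, every lower vertex precedes u, hence v.
            if below_hi and below_lo != set(lower):
                continue
            count += 1
    return count
```

The result is $2^{q+1} - 1 = 2^{\tau/2+1} - 1$, matching the lemma, compared with $2^{2q}$ patterns if the connections were unconstrained.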



Algorithm Compress-Flat($\tilde{P}$, $\tilde{n}$, $\tilde{U}$, $\tilde{m}$): where $\tilde{P} = (S, \preceq)$ is a (lg n)-flat
poset of $\tilde{n} \le n$ elements, and $\tilde{U} = \{U_1, ..., U_{\tilde{m}}\}$ is a decomposition of the elements
in $\tilde{P}$ into $\tilde{m}$ antichains.

 1: if $\tilde{m} = 1$ then
 2:   EXIT
 3: else if $|U_i \cup U_{i+1}| \ge c_{\min}$ and $|E_c(U_i \cup U_{i+1})| \ge (n/\lg n)^2$, for an i with $1 \le i < \tilde{m}$ then
 4:   Apply Lemma 4 to the subgraph $G_c(U_i \cup U_{i+1})$. This computes a balanced
      biclique with vertex set $D \subseteq U_i \cup U_{i+1}$ such that $\tau = |D| = \Omega(\lg n / \lg\lg n)$.
 5:   For each element $b \in U_i \cap D$ output a bit sequence $W^-_b$ of $|U_{i+1}|$ bits, where
      $W^-_b[k] = 1$ iff $b \prec U_{i+1}(k)$.
 6:   For each element $a \in U_{i+1} \cap D$ output a bit sequence $W^+_a$ of $|U_i|$ bits, where
      $W^+_a[k] = 1$ iff $U_i(k) \prec a$.
 7:   Let $H = S \setminus (U_i \cup U_{i+1})$. Output an array of integers Y, where $Y[k] \in [0, 2^{\tau/2+1} - 1]$
      and indicates how H(k) is connected to D (see Lemma 5).
 8:   Set $U_i \leftarrow U_i \setminus D$
 9:   Set $U_{i+1} \leftarrow U_{i+1} \setminus D$
10:   Compress-Flat($\tilde{P} \setminus D$, $\tilde{n} - \tau$, $\tilde{U}$, $\tilde{m}$)
11: else
12:   Perform a merge step on $U_1$ and $U_2$
13:   Set $\tilde{m} \leftarrow \tilde{m} - 1$
14:   Compress-Flat($\tilde{P}$, $\tilde{n}$, $\tilde{U}$, $\tilde{m}$)
15: end if



Consider the algorithm Compress-Flat. The main idea is to repeatedly
apply Lemma 4 to two consecutive antichains in the antichain decomposition that
have many edges (as defined on line 3) between them in the transitive closure
graph. If no such antichains exist, then we apply merge steps. The algorithm
terminates when only one antichain remains. We refer to the case on lines 4-10
as the dense case, and the case on lines 12-14 as the sparse case. We now prove
that the size of the output of the compression algorithm matches the information
theory lower bound to within lower order terms.

Lemma 6. The output of Compress-Flat($\tilde{P}$, $\tilde{n}$, $\tilde{U}$, $\tilde{m}$) is no more than $n^2/4 +
O((n^2 \lg\lg n)/\lg n)$ bits.

Proof (Sketch). In the base case (line 2), the lemma trivially holds since nothing
is output. Next we give the intuition to show that the total output from all
the sparse cases cannot exceed $O((n^2 \lg\lg n)/\lg n)$ bits. Recall that the representation
of Lemma 1 compresses to $\lceil \lg \binom{t}{\beta} \rceil + O(t \lg\lg t/\lg t)$ bits, where t is
the length of the bit sequence, and $\beta$ is the number of 1 bits. We use the fact
that $\lceil \lg \binom{t}{\beta} \rceil \le \beta \lg(et/\beta) + O(1)$ Section 4.6.4]. For a single pass through
the sparse case, the total number of bits represented by B is $t = O(n^2)$, and
$\beta = O((n/\lg n)^2)$ bits are 1's. Thus, the first term in the space bound to represent
B using Lemma 1 (applying the inequality) is $O((n^2 \lg\lg n)/\lg^2 n)$ bits.
Since we can enter the sparse case at most lg n times before exiting on line 2, the
total number of bits occupied by the first term is bounded by $O((n^2 \lg\lg n)/\lg n)$.
To ensure the second term ($O(t \lg\lg t/\lg t)$) in the space bound of Lemma 1 does
not dominate the cost, we use the standard technique of applying Lemma 1 to
the concatenation of all the bit sequences output in the sparse case, rather than
each individual sequence separately (see Appendix C for more details).
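The arithmetic behind the first term can be spelled out as a short worked bound. This is a sketch under simplifying assumptions that are not in the original text: lg is treated as a real-valued logarithm, the hidden constants are taken to be 1, and we use the fact that $\beta \lg(et/\beta)$ is non-decreasing in both $\beta$ and t whenever $\beta \le t$, so the upper bounds $t \le n^2$ and $\beta \le n^2/\lg^2 n$ may be substituted directly.

```latex
\begin{align*}
\beta \lg\frac{e t}{\beta}
  &\le \frac{n^2}{\lg^2 n} \, \lg\!\left( e \cdot n^2 \cdot \frac{\lg^2 n}{n^2} \right)
   = \frac{n^2}{\lg^2 n} \left( \lg e + 2 \lg\lg n \right)
   = O\!\left( \frac{n^2 \lg\lg n}{\lg^2 n} \right), \\
\lg n \cdot O\!\left( \frac{n^2 \lg\lg n}{\lg^2 n} \right)
  &= O\!\left( \frac{n^2 \lg\lg n}{\lg n} \right)
  \qquad \text{(at most } \lg n \text{ passes through the sparse case).}
\end{align*}
```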

We now prove the lemma by induction for the dense case. Let S(n) denote
the number of bits output by Compress-Flat($\tilde{P}$, n, $\tilde{U}$, $\tilde{m}$). Inductive step: We
can assume $S(n_0) \le n_0^2/4 + c_0 (n_0^2 \lg\lg n_0)/\lg n_0$ for all $1 \le n_0 < n$, where $n \ge 2$,
and $c_0 > 0$ is some sufficiently large constant. All the additional self-delimiting
information (for example, storing the length of the sequences output on lines
5-7) occupies no more than $c_1 \lg n$ bits for some constant $c_1 > 0$. Finally, recall
that $\tau \ge c_2 \lg n/\lg\lg n$ for some constant $c_2 > 0$. We have:

\begin{align*}
S(n) &= \frac{\tau}{2}\left(|U_i| + |U_{i+1}|\right) + \left(n - (|U_i| + |U_{i+1}|)\right) \lg\left(2^{\tau/2+1}\right) + c_1 \lg n + S(n - \tau) \\
&\le \left(\frac{\tau}{2} + 1\right) n + c_1 \lg n + \frac{1}{4}\left(n^2 - 2n\tau + \tau^2\right) + \frac{c_0 \lg\lg n}{\lg(n - \tau)}\left(n^2 - 2n\tau + \tau^2\right) \\
&\le c_3 n + \frac{n^2}{4} + \frac{c_0 n^2 \lg\lg n}{\lg(n - \tau)} - c_4 n \qquad (c_4 < c_0 c_2,\ c_3 > 1) \\
&\le \frac{n^2}{4} + \frac{c_0 n^2 \lg\lg n}{\lg(n - \tau)} - c_5 n \qquad (c_5 = c_4 - c_3)
\end{align*}



Note that through our choice of $c_0$ and $c_3$, we can ensure that $c_5$ is a positive
constant. If $\lg(n - \tau) = \lg n$, then the induction step clearly holds. The alternative
case can only happen when n is greater than a power of 2, and $n - \tau$ is less than
a power of two, due to the ceiling function on lg. Thus, the alternative case
only occurs once every O(n/lg n) times we remove a biclique, since each biclique
contains O(lg n) elements. By charging this extra cost to the rightmost negative
term, the induction holds. □

We now show how to support precedence queries on a (lg n)-flat poset. As
in the previous section, if element a is removed in the dense case, we say a is
associated with the output on lines 6-9. Similarly, for each antichain $U_i \in U$
involved in a merge step as the upper antichain in the sparse case, we say that
$U_i$ is associated with the output of that merge step, and vice versa.

Lemma 7. Let P be a (lg n)-flat poset on n elements, with antichain decomposition
$U = \{U_1, ..., U_m\}$. There is a data structure of size $n^2/4 + O((n^2 \lg\lg n)/\lg n)$
bits that, given two elements a and b, can report whether a precedes b in O(1)
time.

Proof (Sketch). We augment the output of Compress-Flat with additional
data structures in order to answer queries efficiently. Let $D_0$ be an empty set.
We denote the first set of elements removed in a dense case as $D_1$, the second set
as $D_2$, and so on. Let $D_r$ denote the last set of elements removed in a dense case,
for some $r = O(n \lg\lg n/\lg n)$. Let $S_\ell = S \setminus (\cup_{i=0}^{\ell-1} D_i)$, for $1 \le \ell \le r + 1$. We define
$M_\ell(x)$ to be the number of elements $a \in S_\ell$ such that $S(y) = a$, and $y \le x$. We
now discuss how to compute $M_\ell(x)$ in O(1) time using a data structure of size
$O(n^2 \lg\lg n/\lg n)$ bits. Define $M'_\ell$ to be a bit sequence, where $M'_\ell[x] = 1$ iff $S(x) \in S_\ell$,
for $x \in [1, n]$. We represent $M'_\ell$ using the data structure of Lemma 1, for
$1 \le \ell \le r + 1$. Overall, these data structures occupy $O(n^2 \lg\lg n/\lg n)$ bits, since
$r = O((n \lg\lg n)/\lg n)$, and each binary string occupies O(n) bits by Lemma 1. To
compute $M_\ell(x)$ we return $rank(M'_\ell, x)$, which requires O(1) time by Lemma 1.
By combining the index just described with techniques similar in spirit to those
used in Lemma 3, we can support precedence queries in O(1) time. The idea is
to find the output associated with the query elements, and find the correct bit in
the output to examine using the index just described; the details can be found
in Appendix D.

Theorem 1 follows by combining Lemmas 3 (with $\gamma$ set to lg n) and 7.

6 Concluding remarks

In this paper we have presented the first succinct data structure for arbitrary 
posets. For a poset of n elements, our data structure occupies $n^2/4 + o(n^2)$ bits
and can support precedence queries in O(1) time. This is equivalent to supporting
O(1) time queries of the form, "Is the edge (a, b) in the transitive closure graph
of P?" 

Our first remark is that if we want to support edge queries on the transitive 
reduction instead of the closure, a slightly simpler data structure can be used. 
The reason for this simplification is that for the transitive reduction, our key 
lemma does not require the antichains containing the biclique to be consecutive, 
and, furthermore, we can "flatten" the transitive reduction in a much simpler 
way than by using Lemma 3. We defer additional details to the full version.
Our second remark is that, in terms of practical behaviour, there are alternative 
representations of bit sequences that support our required operations efficiently 
(though not O(1) time), and have smaller lower order terms in their space bound
(e.g., [E]). In practice, using these structures would reduce the lower order terms 
significantly. Finally, we remark that we can report the neighbours of an arbitrary
element in the transitive closure graph efficiently, without asymptotically
increasing the space bound of Theorem 1. This is done by encoding the neighbours
using a bit sequence, if there are few of them, and checking all $n - 1$
possibilities via queries to the data structure of Theorem 1 if there are many.
We defer the details until the full version. 
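
The few/many split for neighbour reporting can be illustrated with a minimal Python sketch. It is not the construction deferred to the full version: `precedes` is a hypothetical stand-in for the $O(1)$ precedence query of Theorem 1, `threshold` is an assumed cutoff that the text does not specify, "neighbours" is taken here to mean successors, and a plain list replaces the bit-sequence encoding.

```python
from typing import Callable, Dict, List


class NeighbourReporter:
    """Hybrid neighbour reporting on top of a precedence oracle.

    Assumptions (not from the paper): elements are 0..n-1, `precedes(a, b)`
    answers "does a precede b?", and `threshold` separates "few" from "many".
    """

    def __init__(self, n: int, precedes: Callable[[int, int], bool],
                 threshold: int) -> None:
        self.n = n
        self.precedes = precedes
        # Explicit lists are kept only for elements with few neighbours.
        self.explicit: Dict[int, List[int]] = {}
        for a in range(n):
            nbrs = self._scan(a)
            if len(nbrs) <= threshold:
                self.explicit[a] = nbrs

    def _scan(self, a: int) -> List[int]:
        # Check all n - 1 other elements with one oracle query each.
        return [b for b in range(self.n) if b != a and self.precedes(a, b)]

    def neighbours(self, a: int) -> List[int]:
        if a in self.explicit:      # few neighbours: read the stored encoding
            return self.explicit[a]
        return self._scan(a)        # many neighbours: query every candidate


# Example oracle: divisibility on the values 1..6 (element i holds value i + 1).
def divides(a: int, b: int) -> bool:
    return a != b and (b + 1) % (a + 1) == 0


reporter = NeighbourReporter(6, divides, threshold=2)
```

When an element has many neighbours, the scan over all $n - 1$ candidates costs time proportional to $n$, which is within a constant factor of the output size once the threshold is proportional to $n$.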

A Generalization to Transitive Binary Relations

In this section we discuss how to generalize Theorem 1 to transitive binary
relations. We make use of some notation described in Section 5, so we recommend
reading that section first.

Theorem 3. Let $T = (S, \prec)$ be a transitive binary relation $\prec$ on a set of
elements $S$, where $|S| = n$. There is a succinct data structure for representing $T$
that occupies $n^2/4 + O((n^2 \lg\lg n)/\lg n)$ bits, and can support precedence queries
in $O(1)$ time: i.e., given two elements $a, b \in S$, report whether $a \prec b$.

Proof. Given a transitive binary relation, $T = (S, \prec)$, we store a bit sequence $B$,
where $B[i] = 1$ iff $S(i) \prec S(i)$. Thus, by using $n$ bits, we can report whether $a \prec a$
in $O(1)$ time, for any $a \in S$. At this point, we define a quasiorder $Q = (S, \preceq')$,
where $a \preceq' b$ iff $a \prec b$, for all distinct elements $a, b \in S$. We represent $Q$
using Corollary 1. Given $a, b \in S$, if $a = b$, and $S(i) = a$, then we query $B$
and report "yes" iff $B[i] = 1$; otherwise, we query the representation of $Q$ to
determine whether $a$ precedes $b$. □
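
The reduction in this proof can be sketched in a few lines of Python. The sketch only illustrates the query logic: a Python set of pairs stands in for the succinct representation of $Q$ from Corollary 1, and the class and parameter names are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


class TransitiveRelation:
    """Precedence queries on a transitive relation via self-loop bits plus Q.

    `related` is assumed to be a transitive set of pairs (a, b) meaning a < b.
    The set `self.Q` is only a placeholder for the succinct quasiorder
    structure; it does not achieve the space bound of Theorem 3.
    """

    def __init__(self, elements: Iterable[Hashable],
                 related: Set[Tuple[Hashable, Hashable]]) -> None:
        elems = list(elements)
        self.index = {a: i for i, a in enumerate(elems)}
        # B[i] = 1 iff S(i) is related to itself (n bits in total).
        self.B = [1 if (a, a) in related else 0 for a in elems]
        # Pairs of distinct elements; reflexivity is implicit in the quasiorder.
        self.Q = {(a, b) for (a, b) in related if a != b}

    def precedes(self, a: Hashable, b: Hashable) -> bool:
        if a == b:
            return self.B[self.index[a]] == 1   # answered from B alone
        return (a, b) in self.Q                 # answered from Q


rel = TransitiveRelation([1, 2, 3], {(1, 2), (2, 3), (1, 3), (2, 2)})
```

Separating the $n$ self-loop bits from $Q$ is what allows a relation that is neither reflexive nor irreflexive to reuse the quasiorder structure unchanged.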



B Proof of inequality used in Lemma 2

The inequality is proved by induction on $t$, fixing $s = 1$ (since the actual value
of $s$ is irrelevant). Base case: $t = 2$ holds since $(n_1 + n_2)(n_1 + n_2 - 1)/2 \ge n_1 n_2$
for all integers $n_1, n_2 \ge 1$. Inductive step: Assume the inequality holds for all
$2 \le t_0 < t$. We have:




2 



2 2 2 2 

'n M -l\ n t (n t -l) 



< ni 



2 

Which completes the proof. 
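
For reference, the base case $t = 2$ stated above can be checked by direct expansion. This worked step covers only the base case and uses the same symbols $n_1, n_2$:

```latex
\[
\frac{(n_1 + n_2)(n_1 + n_2 - 1)}{2}
  = n_1 n_2 + \frac{n_1 (n_1 - 1)}{2} + \frac{n_2 (n_2 - 1)}{2}
  \;\ge\; n_1 n_2 ,
\]
```

since both remaining terms are non-negative for integers $n_1, n_2 \ge 1$, with equality exactly when $n_1 = n_2 = 1$.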

C Extra Details for Lemma 6



In order to achieve $O((n^2 \lg\lg n)/\lg n)$ bits for the sparse case, we need to use
the standard trick in succinct data structures of concatenating all of the bit
sequences output during the merge steps into one long bit sequence, before
applying Lemma 1 to the sequence. Note that we can still perform rank operations
on an arbitrary range $[x_1, x_2]$ of this concatenated sequence, by adjusting our
search to take into account the number of 1s in the prefix $[1, x_1 - 1]$. Since this
can be computed using a single rank operation, it does not affect the time
required to perform rank operations. By storing this concatenated sequence in the
data structure of Lemma 1, we guarantee that the lower order term in the space
bound will not dominate the space bound. By the same analysis presented in
Lemma 2, the length of the concatenated bit sequence will be $O(n^2)$ bits. Thus,
the size of the lower order terms will be $O((n^2 \lg\lg n)/\lg n)$ bits.
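
The prefix adjustment described above can be illustrated with a short Python sketch. A plain prefix-sum array stands in for the succinct rank structure of Lemma 1, so the sketch shows the indexing only, not the space bound; the class and method names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


class ConcatenatedBits:
    """Rank queries on individual sequences stored in one concatenated string."""

    def __init__(self, sequences: Sequence[Sequence[int]]) -> None:
        self.starts: List[int] = []   # 1-based start of each sequence
        bits: List[int] = []
        for seq in sequences:
            self.starts.append(len(bits) + 1)
            bits.extend(seq)
        # prefix[x] = number of 1s in positions [1, x]; prefix[0] = 0.
        self.prefix = [0] + list(accumulate(bits))

    def rank1(self, x: int) -> int:
        return self.prefix[x]

    def range_rank1(self, x1: int, x2: int) -> int:
        # Subtract the 1s in the prefix [1, x1 - 1]: one extra rank operation.
        return self.rank1(x2) - self.rank1(x1 - 1)

    def rank_in_sequence(self, k: int, pos: int) -> int:
        # Number of 1s among the first `pos` bits of the k-th sequence (0-based k).
        x1 = self.starts[k]
        return self.range_rank1(x1, x1 + pos - 1)


cb = ConcatenatedBits([[1, 0, 1], [0, 1, 1, 0], [1]])
```

Each query on an individual sequence costs two rank operations on the concatenated string, so the asymptotic query time is unchanged.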

D Proof of Lemma 7

We augment the output of Compress-Flat with additional data structures in
order to answer queries efficiently. Let $D_0$ be an empty set. We denote the first
set of elements removed in a dense case as $D_1$, the second set as $D_2$, and so
on. Let $D_r$ denote the last set of elements removed in a dense case, for some
$r = O(n \lg\lg n/\lg n)$. Let $S_\ell = S \setminus (\bigcup_{i=0}^{\ell-1} D_i)$, for $1 \le \ell \le r + 1$. We define $M_\ell(x)$
to be the number of elements $a \in S_\ell$ such that $S(y) = a$, and $y \le x$. We now
discuss how to compute $M_\ell(x)$ in $O(1)$ time using a data structure of size
$O(n^2 \lg\lg n/\lg n)$ bits. Define $M'_\ell$ to be a bit sequence, where $M'_\ell[x] = 1$ iff
$S(x) \in S_\ell$, for $x \in [1, n]$. We represent $M'_\ell$ using the data structure of Lemma 1,
for $1 \le \ell \le r + 1$. Overall, these data structures occupy $O(n^2 \lg\lg n/\lg n)$ bits,
since $r = O((n \lg\lg n)/\lg n)$, and each binary string occupies $O(n)$ bits by
Lemma 1. To compute $M_\ell(x)$ we return $\mathrm{rank}_1(M'_\ell, x)$, which requires $O(1)$ time
by Lemma 1.
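
As a small illustration of the index just defined, the following Python sketch builds one table per $\ell \in [1, r + 1]$ and answers $M_\ell(x)$ by lookup. Prefix-sum arrays stand in for the rank structure of Lemma 1, and the function names and input format (each removed set given as a list of 1-based positions in $S$) are assumptions made for the example.

```python
from itertools import accumulate
from typing import List, Sequence


def build_rank_tables(n: int, removed_sets: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return tables where tables[l - 1][x] = M_l(x) for 1 <= l <= r + 1.

    removed_sets[k - 1] lists the 1-based positions (in S) of the elements of D_k.
    """
    r = len(removed_sets)
    alive = [1] * n                  # the bit sequence M'_1: nothing removed yet
    tables: List[List[int]] = []
    for l in range(1, r + 2):
        # Snapshot the rank table of M'_l before D_l is removed.
        tables.append([0] + list(accumulate(alive)))
        if l <= r:
            for pos in removed_sets[l - 1]:
                alive[pos - 1] = 0   # elements of D_l are absent from S_{l+1}
    return tables


def M(tables: List[List[int]], l: int, x: int) -> int:
    # rank_1(M'_l, x): number of elements of S_l among the first x positions.
    return tables[l - 1][x]


tables = build_rank_tables(6, [[2, 5], [1]])
```

Here $D_1$ occupies positions 2 and 5 and $D_2$ occupies position 1, so three tables are built, for $\ell = 1, 2, 3$.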

Consider an element $a$ removed during the dense case as part of the biclique
$D_k$. When we refer to $a$ we will often reference the antichains $U_i$ and $U_{i+1}$
such that $D_k \subset U_i \cup U_{i+1}$ (see line 6). Note that the indices $i$ and $i + 1$ do
not necessarily correspond to the indices of antichains in the initial antichain
decomposition, $U$. We store an array $C$, where:

– $C[a].\mathrm{id}$ is the value $k$ such that $a \in D_k$, or $\infty$ if $a$ was not removed;

– $C[a].\mathrm{rank}$ is the value $x$ such that $D_k(x) = a$;

– $C[a].\mathrm{top}$ is a bit indicating whether $a$ was in $U_{i+1}$, when $D_k$ was removed;

– $C[a].\mathrm{pnt}$ is a pointer to the output associated with $a$ (the binary strings $W$ and the integer sequence $Y$);

– $C[a].\mathrm{ds}$ is the number of elements with rank less than $a$ in $U_i \cup U_{i+1}$;

– $C[a].\mathrm{dt}$ is the number of elements with rank greater than $a$ in $U_i \cup U_{i+1}$.

Similar in spirit to Lemma 3, we store an $O(n \lg n)$ bit array that, in $O(1)$
time, for elements $a$ and $b$ returns $i$, $j$, $x$ and $y$ such that $U_i, U_j \in U$, $U_i(x) = a$, and
$U_j(y) = b$. Note that in this case, the indices do correspond to the indices of the
antichains in the initial antichain decomposition $U$. We also store an array $A$ of
records, where, for each antichain $U_j \in U$, if $U_j$ was the upper antichain in a
merge step during a sparse case:

– $A[j].\mathrm{pnt}$ points to the beginning of the sequence, $B$, associated with $U_j$, or
null if no sequence is associated with $U_j$;

– $A[j].\mathrm{delta}$ stores the value $\ell$ such that the merge step occurred after the
element set $D_{\ell-1}$ was removed, and before $D_\ell$ was removed.

Finally, we store an array of partial sums $F$, where $F[i] = \sum_{k=1}^{i-1} |U_k|$. All
these additional data structures occupy $O((n^2 \lg\lg n)/\lg n)$ bits, so the claimed
space bound holds by Lemma 6.
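
A minimal Python sketch of the partial-sum array follows. It assumes that $F[i]$ counts the elements in the antichains before $U_i$, so that $F[i] + x$ is the position of $U_i(x)$ in $S$, which is how $F$ is used in the query algorithm; the upper summation limit is not legible in the source, so that reading is an assumption, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def partial_sums(antichain_sizes: Sequence[int]) -> List[int]:
    """Return F with F[i] = |U_1| + ... + |U_{i-1}| for 1-based i (F[0] is unused)."""
    totals = list(accumulate(antichain_sizes))
    # F[1] = 0; F[i] for i >= 2 is the running total of the first i - 1 sizes.
    return [0, 0] + totals[:-1]


def position(F: Sequence[int], i: int, x: int) -> int:
    # 1-based position in S of the x-th element of antichain U_i.
    return F[i] + x


F = partial_sums([3, 2, 4])
```

With antichain sizes 3, 2 and 4, the first element of $U_2$ sits at position 4 and the last element of $U_3$ at position 9.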

Query Algorithm: If i = j, then we return "no". Otherwise, we assume, without 
loss of generality, i > j. There are several cases: 

1. If $C[a].\mathrm{id} = C[b].\mathrm{id}$ and $C[a].\mathrm{id} \ne \infty$, then:

(a) If $C[a].\mathrm{top} \ne C[b].\mathrm{top}$, then report "yes", since there must be an edge
between $a$ and $b$ in the removed biclique.

(b) Otherwise, use $A[i].\mathrm{pnt}$ to locate the bit sequence $B$, let $\ell = A[i].\mathrm{delta}$,
and $z = M_\ell(F[j] + y)$. We report "yes" if $B_z[M_\ell(F[i] + x)] = 1$ and "no"
otherwise.



2. If $C[a].\mathrm{id} = C[b].\mathrm{id} = \infty$, then the procedure is similar to case 1b.

3. If $C[a].\mathrm{id} < C[b].\mathrm{id}$, then let $\ell = C[a].\mathrm{id}$.

(a) If $A[j].\mathrm{top} = 1$ and $M_\ell(F[i] + x) - C[a].\mathrm{ds} < M_\ell(F[j] + y)$, then consider
the binary string $W$, that we can locate using $C[a].\mathrm{pnt}$. If $A[j].\mathrm{delta} > \ell$,
then bit $W^{+}[M_\ell(F[j] + y) - M_\ell(F[j])]$ indicates whether there is an
edge from $a$ to $b$. Otherwise, we check bit $W^{-}[M_\ell(F[j] + y)]$.

(b) If $A[i].\mathrm{top} = 0$ and $M_\ell(F[i] + x) - C[a].\mathrm{ds} - 1 = 0$, then the bit we want
to examine was output during a merge step, and we handle this as in
case 1b.

(c) Otherwise, consider the sequence of integers, $Y$, that we can locate using
$C[a].\mathrm{pnt}$. By examining $Y[M_\ell(F[j] + y)]$ and $C[a].\mathrm{rank}$ we can determine
whether there is an edge from $a$ to $b$ in $O(1)$ time (see the footnote below).

4. If $C[b].\mathrm{id} < C[a].\mathrm{id}$, then let $\ell = C[b].\mathrm{id}$.

(a) If $A[i].\mathrm{delta} < \ell$, then the bit we want to examine was output during a
merge step, and we handle this as in case 1b.

(b) If $S[i].\mathrm{top} = 0$, and $M_\ell(F[j] + y) + C[b].\mathrm{dt} > M_\ell(F[i] + y)$, then
consider the binary string $W$, that we can locate using $C[b].\mathrm{pnt}$. Let
$z = M_\ell(F[i] + x) - M_\ell(F[i])$. The bit $W[z]$ indicates whether there
is an edge from $a$ to $b$.

(c) Otherwise, consider the sequence of integers $Y$, that we can locate using
$C[b].\mathrm{pnt}$. We examine $Y[M_\ell(F[i] + x) - C[b].\mathrm{ds} - C[b].\mathrm{dt} - 1]$ and $C[b].\mathrm{rank}$
to determine whether $a$ is connected to $b$. Notice that we must correct for
the fact that the two consecutive antichains, $U_i$ and $U_{i+1}$, that contain
$D_\ell$ are not part of the set $H$ on line 9.

□ 
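
The top-level case analysis of the query algorithm can be summarised by the following Python sketch. It covers only the dispatch on the `id` and `top` fields of $C$, not the bit lookups within each case; the function name and the string labels are hypothetical, and `math.inf` plays the role of $\infty$.

```python
import math
from typing import Optional


def classify(id_a: float, id_b: float,
             top_a: Optional[int] = None, top_b: Optional[int] = None) -> str:
    """Return the label of the query case that applies to elements a and b."""
    if id_a == id_b and id_a != math.inf:
        # Both elements were removed in the same biclique.
        return "1a" if top_a != top_b else "1b"
    if id_a == id_b:
        # Neither element was removed in a dense case.
        return "2"
    if id_a < id_b:
        # a was removed first, so the output associated with a is consulted.
        return "3"
    # b was removed first, so the output associated with b is consulted.
    return "4"
```

For example, `classify(2, 2, 0, 1)` returns `"1a"`: both elements lie in the biclique $D_2$ but on opposite sides, so the answer is "yes" without examining any output bits.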



Footnote to case 3(c): Briefly, we can use word-level parallelism, since $Y[M_\ell(F[j] + y)]$ fits in $O(1)$ words.